System and method for multi-row decoding of video with dependent rows

ABSTRACT

A system and method for decoding digital video by processing multiple regions of an image in parallel, even when there are dependencies between rows in the image, are disclosed. The method generally involves decoding multiple rows concurrently, with the start of decoding of a given row being delayed until portions of the other rows on which the given row depends have been decoded. The system generally comprises parallel processors, with one processor typically decoding one row and another processor typically decoding the row above it. In accordance with the present invention, however, any number or type of processors can decode, or perform decoding functions on, the image in parallel.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application makes reference to, and claims priority to and the benefit of, U.S. provisional application Serial No. 60/370,716 filed Apr. 8, 2002.

INCORPORATION BY REFERENCE

[0002] The above-referenced U.S. provisional application Serial No. 60/370,716 is hereby incorporated herein by reference in its entirety.

FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0003] [Not Applicable]

MICROFICHE/COPYRIGHT REFERENCE

[0004] [Not Applicable]

FIELD OF THE INVENTION

[0005] The present invention relates generally to video decoding systems, and more particularly to an improved video decoding system and method that supports processing of multiple regions of an image in parallel, even when there are interdependencies between rows of the image.

BACKGROUND OF THE INVENTION

[0006] There are many major challenges in the field of digital video decompression. The design of decompression engines, referred to here as decoders, becomes more difficult as the compression formats and algorithms become more complex, and as image size or resolution and bit rates increase. Traditionally, video decoders decode the compressed bit stream sequentially, in the order that the data is received, without parallel processing of multiple rows. For some video formats, picture sizes and bit rates, this traditional method may require extremely high levels of performance that may be either impractical or very expensive.

[0007] Some existing video decoders improve performance by decoding multiple rows of video in parallel, using parallel processors. In other words, multiple rows are decoded in parallel, with decoding of each row starting as soon as the bit stream and a decode processor are available, and proceeding at whatever rate the processor delivers. A row is a set of macroblocks, typically 16×16 pixels each, which span the image from left to right, with a height of one macroblock. Some existing MPEG-2 HD (high definition) decoders decode 2 rows in parallel. This is possible because of a few attributes of the MPEG-2 video bit stream format, notably (a) each row has a start code that the decoder can find without necessarily decoding the row before or above it, and (b) each row can be decoded independently of the other rows in the same picture.

[0008] Some newer video compression formats, such as the Joint Video Team (JVT) project of ISO-MPEG and ITU-VCEG, provide much better compression than MPEG-2, in terms of compressed bit rate for a given level of quality. This makes them attractive for commercial deployment. However, they are also much more complex to encode and decode than MPEG-2. There is a desire to have a practical, low cost decoder that can decode JVT video at HD resolutions and bit rates. Since JVT is much more complex to decode than MPEG-2, the motivation to find a way to implement a decoder with parallel operations is even greater than that for MPEG-2.

[0009] However, JVT video does not have the second property of MPEG-2 video listed above; that is, each row cannot be decoded independently of the other rows in the picture. On the contrary, each row typically depends upon the results of decoding some portions of other rows in the same picture. This makes it difficult to design a decoder that can decode more than one row in parallel. Some experts believe it is impossible to decode multiple portions of a picture concurrently using the JVT format, due to the interdependency between rows.

[0010] Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.

SUMMARY OF THE INVENTION

[0011] Aspects of the present invention may be found in a system for decoding, in parallel, more than one row of video data. The system may comprise two or more processors that perform an identical decoding function, each on macroblocks in respective rows of input video data. In one embodiment, the decoding function comprises a complete decoding of macroblocks, while in another embodiment, the decoding function comprises entropy decoding, for example.

[0012] Each of the processors waits to perform the decoding function on a given macroblock in its respective row until one or more other processors, working on one or more other rows, have completed performance, on all predecessor macroblocks for the given macroblock, of the decoding functions on which the given macroblock depends. In some cases, a processor also waits to perform the decoding function on the given macroblock until it has performed the decoding function on a further predecessor macroblock for the given macroblock that is located in its respective row.

[0013] Further aspects of the present invention may be found in a method where a macroblock is received; a determination is made that a decoding function has been completed with respect to each of one or more predecessor macroblocks for the received macroblock; and the decoding function is then performed on the received macroblock in response to the determination.

[0014] Additional aspects of the present invention may be found in another method, where a macroblock is received, and a decoding function is performed on the received macroblock only after the decoding functions on which the decoding function depends have been completed on each of one or more predecessor macroblocks for the received macroblock.

[0015] Each of these methods may also involve the identification of the predecessor macroblocks.

[0016] These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] FIG. 1 depicts one illustrative embodiment of a generic video decoding system that may be used in connection with the present invention.

[0018] FIG. 2 depicts a representative sample of 5 macroblocks from each of 2 rows of video data, for illustrating exemplary aspects of the present invention.

[0019] FIG. 3 is a flow diagram of a method that may be performed by a decoding system that has two or more processors that are capable of decoding video in parallel, in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0020] The following description is made with reference to the appended figures.

[0021] FIG. 1 depicts one illustrative embodiment of a generic video decoding system that may be used in connection with the present invention. Video decoding system 100 generally comprises a plurality of processors that may perform decoding operations in parallel on more than one row of compressed video data.

[0022] Specifically, the video decoding system 100 generally comprises a decode control processor 101 and two or more video decode processors 103. The decode control processor 101 may be considered a central control unit of the decoding system 100 that manages decoding operations in the system, including the video decode processors 103. The decode control processor 101 may be a RISC processor, for example, and may prepare input video data for decoding, may fetch required data from decoder memory 105, and may itself perform some data processing on the input video data.

[0023] In one embodiment, each of the video decode processors 103 is capable of decoding, on its own, macroblocks of video data, under the control of the decode control processor 101. Thus, in this embodiment, the video decode processors 103 operate in parallel, each on a separate row of input video data, for example.

[0024] In another embodiment, instead of being capable of decoding macroblocks of data entirely on their own, the video decode processors 103 may be any module, engine or accelerator, for example, that performs a decoding function on a macroblock of video data. Such functions may include, for example, entropy decoding, deblocking, inverse transform functionality, inverse quantization, motion compensation, etc. In other words, the video decode processors 103 may be any module, engine or accelerator that could assist the decode control processor 101 by off-loading decoding tasks that may otherwise be a bottleneck in real-time video decoding systems if handled by the decode control processor 101 alone.

[0025] In this latter embodiment, two or more of the video decode processors 103 perform critical functions, such as entropy decoding, for example. These critical processors perform the same decoding function in parallel on two or more rows of video data. Each of the remainder of the processors 103 (i.e., those that are not duplicated) performs another decoding function, such as mentioned above, also on two or more rows of video data, but sequentially—first on a macroblock of one row and then on a macroblock on another row, and so on.

[0026] Additional detail regarding an implementation of this latter embodiment of the video decoding system of FIG. 1, capable of performing dual row decoding, for example, may be found in U.S. application Ser. No. ______ entitled “Method Of Operating A Video Decoding System” filed May 1, 2002 (Attorney Docket No. 13305US01), which application is incorporated herein by reference in its entirety.

[0027] While specific decoding systems are discussed above, it should be understood that any type of system that performs decoding operations in parallel on more than one row of compressed video data may benefit from, and falls within the scope of, the present invention.

[0028] FIG. 2 depicts a representative sample 200 of 5 macroblocks from each of 2 rows 201 and 203 of video data, for illustrating exemplary aspects of the present invention. For purposes of this example, macroblocks are indicated by the letter B. The numbers in each box (m, n) refer to row (m) and horizontal position (n) of each macroblock. In the example in FIG. 2, B(2,2) is dependent on B(1,1), B(1,2), B(1,3), and B(2,1). These dependencies are indicated by arrows. In this context, dependency means that a macroblock cannot be decoded properly until the macroblocks on which it depends have been decoded. The macroblocks on which a given macroblock depends are called its predecessors. The dependency is an artifact of the video stream being decoded, not of the decoder design.

[0029] In the example of FIG. 2, the decoding of B(2,2) does not begin until the decoding of its predecessors B(1,1), B(1,2), B(1,3), and B(2,1) is complete. Once these conditions are met, the decoding of B(2,2) can proceed, while the decoding of B(1,4), B(1,5), etc. may proceed in parallel. In other embodiments using pipelined decoding processes, the decoding of B(2,2) could start before the decoding of the predecessor macroblocks is complete; however, each stage in the decoding pipeline does not begin until the decoding stages on which it depends are completed in the predecessor macroblocks. In general, the term “decoding” may refer to decoding in either pipelined or non-pipelined decoding systems, and “complete” or “decoded” refers to either finishing decoding of a macroblock, or finishing decoding of those pipeline stages of a macroblock on which each pipeline stage of the current macroblock depends.

[0030] Assuming B(2,1) depends only on B(1,1) and B(1,2), which is typical if B(2,1) is the first macroblock in the row, the decoding of B(2,1) can begin as soon as B(1,1) and B(1,2) are decoded.
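By way of illustration only, the dependency pattern of FIG. 2 may be expressed as a small function. The following Python sketch assumes that every macroblock follows the same pattern as B(2,2), namely the three nearest macroblocks in the row above plus the left neighbor in the same row; the function name `predecessors` and the parameter `row_width` are hypothetical, and an actual video format may define a different set of predecessors.

```python
def predecessors(m, n, row_width):
    """Return the predecessors of macroblock B(m, n), with m and n counted from 1.

    Assumed pattern (taken from the FIG. 2 example, not from any format
    specification): above-left, above and above-right in row m-1, plus the
    left neighbor in row m. Positions outside the picture are skipped.
    """
    preds = []
    if m > 1:
        for col in (n - 1, n, n + 1):
            if 1 <= col <= row_width:
                preds.append((m - 1, col))
    if n > 1:
        preds.append((m, n - 1))
    return preds


print(predecessors(2, 2, 5))  # [(1, 1), (1, 2), (1, 3), (2, 1)]
print(predecessors(2, 1, 5))  # [(1, 1), (1, 2)]
```

Under this assumed pattern, the first macroblock of a row has only two predecessors, both in the row above, which agrees with the case of B(2,1) described above.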

[0031] In typical operation, decoding begins with the first row. Typically, the first row does not depend on any other row in the picture, that is, it has no predecessors in the same picture, so decoding of the first row can start as soon as its dependencies are met. These dependencies include the availability of the bit stream to be decoded, the availability of a decoding processor to perform the task, and the completed decoding of any other pictures upon which decoding of this picture depends. Decoding of the first row may proceed at any rate of which the processor is capable; it does not need to wait for any other processing to be completed.

[0032] Once the predecessors of B(2,1), the first macroblock of the second row, have been decoded, a second processor can begin decoding the second row. Decoding of successive macroblocks in the second row is paced by the completion of the predecessor macroblocks for each of the macroblocks in the second row. In general, the second row can be decoded at the same rate as the first row; only the start of decoding is delayed. In the example of FIG. 2, decoding the second row is delayed by 2 macroblocks with respect to the first row. If the video format had a different set of predecessors, the delay might be different.

[0033] Additional rows can be decoded in parallel if desired. A third row can be decoded in parallel, again with the start of decoding of the row waiting until the predecessors of its first macroblock have been decoded. Again, the decoding of each successive macroblock is paced by the completion of the respective predecessors of each macroblock. Using the same stream characteristics illustrated in FIG. 2, decoding of the third row is typically delayed by 2 macroblocks with respect to the second row. The same applies to a fourth or additional rows.
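The delay of 2 macroblocks per row may be checked with a simple timing model. The following Python sketch is offered as an illustration under stated assumptions: one processor per row, every macroblock taking exactly one time unit to decode, and the predecessor pattern of the FIG. 2 example applied to every macroblock. The function name `earliest_starts` is hypothetical.

```python
def earliest_starts(num_rows, row_width):
    """Earliest start time of each macroblock B(m, n), counted from 1.

    Assumptions: unit decode time per macroblock, one processor per row, and
    the FIG. 2 predecessor pattern (three macroblocks above, one to the left).
    """
    start, finish = {}, {}
    for m in range(1, num_rows + 1):
        for n in range(1, row_width + 1):
            preds = []
            if m > 1:
                preds += [(m - 1, c) for c in (n - 1, n, n + 1)
                          if 1 <= c <= row_width]
            if n > 1:
                preds.append((m, n - 1))
            # A macroblock may start once its last predecessor has finished.
            start[(m, n)] = max((finish[p] for p in preds), default=0)
            finish[(m, n)] = start[(m, n)] + 1
    return start


starts = earliest_starts(3, 5)
print([starts[(m, 1)] for m in (1, 2, 3)])   # [0, 2, 4]
print([starts[(2, n)] for n in range(1, 6)])  # [2, 3, 4, 5, 6]
```

In this model each row starts two time units after the row above it and then advances at the same rate, so with enough processors the rows are decoded as a staggered wavefront.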

[0034] The processors (e.g., video decode processors 103 of FIG. 1) can communicate about the completion of predecessor macroblocks in a number of ways. In one illustrative embodiment, each processor has a register indicating the number of the macroblock most recently decoded, and this register is read by the other processors. In another embodiment using a pipelined decoding system, each processor has a register indicating the number of the macroblock most recently decoded and the pipeline stage most recently completed of the macroblock most recently started. As such, the other processors can readily determine when predecessor macroblocks or selected stages of them are complete. In another embodiment, each processor writes the number of the most recently complete macroblock to a location in memory (e.g., decoder memory 105 of FIG. 1) shared by a control processor (e.g., decode control processor 101 of FIG. 1) that controls all decoding operations. In another embodiment, each processor also writes the pipeline stage most recently completed of the macroblock most recently started. Of course, many other implementations are also contemplated and fall within the scope of the present invention.
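The register-based embodiment may be modeled in software as follows. This Python sketch is an illustration only: threads stand in for the video decode processors, a shared list `progress` stands in for the per-processor registers holding the number of the most recently decoded macroblock, and the FIG. 2 predecessor pattern is assumed. All names are hypothetical, and the actual decoding work is omitted.

```python
import threading


def decode_rows_in_parallel(num_rows, row_width):
    """Decode each row on its own thread, paced by the row above.

    progress[m] models the register of the processor for row m: the number of
    the macroblock it most recently completed (0 means none yet). Index 0 is a
    dummy entry standing for "no row above".
    """
    progress = [0] * (num_rows + 1)
    cond = threading.Condition()
    order = []  # completion order, recorded for inspection

    def worker(m):
        for n in range(1, row_width + 1):
            # Assumed FIG. 2 pattern: B(m, n) needs the row above decoded
            # through column n+1 (or to the end of the row). The left neighbor
            # is already done because each row is decoded in order.
            needed = min(n + 1, row_width) if m > 1 else 0
            with cond:
                cond.wait_for(lambda: progress[m - 1] >= needed)
            # ... the decoding of macroblock B(m, n) would take place here ...
            with cond:
                progress[m] = n  # update this processor's register
                order.append((m, n))
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(m,))
               for m in range(1, num_rows + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


order = decode_rows_in_parallel(3, 5)
print(len(order))  # 15
```

The exact interleaving of rows in `order` varies from run to run, but no macroblock is recorded before the macroblocks in the row above on which it depends. A hardware implementation would typically poll or be signaled by a register rather than use a condition variable.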

[0035] FIG. 3 is a flow diagram of a method that may be performed by a decoding system that has two or more processors capable of decoding video in parallel, in accordance with one embodiment of the present invention. Method 300 illustrated in FIG. 3 operates generally as follows. In an illustrative embodiment, as discussed above, a core processor (e.g., decode control processor 101 of FIG. 1) controls the operation of parallel decode processors (e.g., video decode processors 103 of FIG. 1). The core processor selects a row for each of one or more of the decode processors to decode (see reference numeral 301 of FIG. 3). The method of selection of rows to decode typically depends on the details of the video format. For example, MPEG-2 video includes slice start codes, with a unique start code prefix, in every row. As in prior art methods of parallel multi-row decoding, an element in the video decoder, such as the core processor or a stream pre-processor, can identify the locations of all rows before they are decoded. Other formats, including those that have inter-row dependencies, may have similar start codes, or other methods such as a network adaptation layer that packages rows or slices in network packets, or syntax indicating the length of each slice or row. It should be understood that the use of the term “row” herein is intended to generically cover both the term row as well as the term slice.
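For the MPEG-2 case, locating rows before decoding them may be sketched as a scan for slice start codes. The sketch below relies on the MPEG-2 video convention that a start code is the byte prefix 0x00 0x00 0x01 followed by one code byte, with code values 0x01 through 0xAF denoting slices. The function name `find_slice_start_codes` and the sample byte string are hypothetical, and the sketch makes no attempt to parse slice headers.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def find_slice_start_codes(data: bytes):
    """Return (offset, code) pairs for every slice start code found in data.

    offset is the position of the first prefix byte; code is the byte that
    follows the prefix (0x01..0xAF for slices in MPEG-2 video).
    """
    found = []
    i = data.find(START_CODE_PREFIX)
    while i != -1 and i + 3 < len(data):
        code = data[i + 3]
        if 0x01 <= code <= 0xAF:
            found.append((i, code))
        i = data.find(START_CODE_PREFIX, i + 3)
    return found


# Made-up bytes: a picture start code (0x00), then two slices with dummy payload.
sample = (b"\x00\x00\x01\x00\xaa"
          b"\x00\x00\x01\x01\x12\x34"
          b"\x00\x00\x01\x02\x56")
print(find_slice_start_codes(sample))  # [(5, 1), (11, 2)]
```

A core processor or stream pre-processor could use such offsets to hand each decode processor the starting position of its row; formats without start codes would need one of the other mechanisms mentioned above.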

[0036] Referring again to FIG. 3, the start of decoding of each picture does not begin until the dependencies of that picture are met (see reference numeral 303). Typical dependencies of a picture include the decoding of previous pictures, and the availability of the compressed data to be decoded.

[0037] The start of decoding of each macroblock (see reference numeral 305), starting with the first macroblock in each row, does not begin until its predecessor macroblocks have been decoded (see reference numeral 307). Which macroblocks are predecessors is a function of the video data format. An example is illustrated in FIG. 2. The set of dependencies may vary from picture to picture in a given video stream. Once decoding of one macroblock has begun, the decoding of the next macroblock may begin if all of its predecessors have been decoded. In some embodiments the decoding of a macroblock may be pipelined, such that one macroblock may not be fully decoded before the next macroblock is started. In such embodiments, each stage of decoding may begin as soon as all its dependencies have been met; dependencies typically include, among other things, the completion of those pipeline stages of predecessor macroblocks on which the current pipeline stage depends. Decoding of a macroblock may be delayed until the previous macroblock has been completed, depending on the dependencies of the macroblock as specified by the video format.
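The pipelined variant of this check may be sketched as follows. The Python fragment is a simplified illustration: the stage list and its ordering are hypothetical, and it assumes that a given stage of a macroblock depends only on the same stage of its predecessors and on the earlier stages of the macroblock itself. An actual format and decoder may impose different stage dependencies.

```python
# Hypothetical pipeline stages, listed in an assumed order of execution.
STAGES = ("entropy_decode", "inverse_transform", "motion_compensation", "deblock")


def stage_ready(completed, mb, stage, preds):
    """Return True if pipeline stage `stage` (an index into STAGES) may begin on mb.

    completed maps a macroblock (m, n) to the number of stages it has finished.
    Assumed rule: the macroblock itself has finished exactly the earlier
    stages, and every predecessor has finished this same stage.
    """
    own_ok = completed.get(mb, 0) == stage
    preds_ok = all(completed.get(p, 0) > stage for p in preds)
    return own_ok and preds_ok


# Predecessors of B(2,2) as in the FIG. 2 example.
preds_22 = [(1, 1), (1, 2), (1, 3), (2, 1)]
completed = {(1, 1): 2, (1, 2): 2, (1, 3): 1, (2, 1): 1}

print(stage_ready(completed, (2, 2), 0, preds_22))  # True
completed[(2, 2)] = 1
print(stage_ready(completed, (2, 2), 1, preds_22))  # False
```

In the example, stage 0 of B(2,2) may begin because every predecessor has finished stage 0, even though none of them is fully decoded. Stage 1 must wait because B(1,3) and B(2,1) have not yet finished that stage.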

[0038] The process continues as long as desired. In a typical embodiment such as shown in FIG. 3, this process continues until the decoding of a row has been completed (see reference numerals 309 and 311). In other embodiments, one decode processor could decode less than a row; for example another processor could decode another part of the same row in parallel. In another embodiment, the process continues beyond the end of one row. For example one decode processor could decode multiple rows sequentially, while another decode processor could decode a different set of rows.
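One way in which each decode processor could be given a different set of rows is an interleaved assignment. The Python sketch below is one possible policy offered as an assumption, since the description above does not prescribe how rows are distributed; the function name `assign_rows` is hypothetical.

```python
def assign_rows(num_rows, num_processors):
    """Assign rows 1..num_rows to processors 0..num_processors-1 in rotation.

    Interleaving is an assumed policy: it keeps each processor working on a
    row adjacent to the rows handled by the other processors, which suits the
    staggered start between a row and the row above it.
    """
    return {p: list(range(p + 1, num_rows + 1, num_processors))
            for p in range(num_processors)}


print(assign_rows(6, 2))  # {0: [1, 3, 5], 1: [2, 4, 6]}
```

With two processors, the first decodes rows 1, 3 and 5 in sequence while the second decodes rows 2, 4 and 6, each row still being paced by the completion of its predecessors in the row above.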

[0039] Aspects of the present invention therefore allow the construction of practical decoders of digital video, where the format of the video involves dependencies between rows, and where the computational complexity of the decoding algorithm is very high, such that a purely sequential decoding method may not be practical or cost-effective.

[0040] Many modifications and variations of the present invention are possible in light of the above teachings. Thus, it is to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as described hereinabove. 

1. A system for decoding, in parallel, more than one row of video data comprising: a first processor that performs a decoding function on at least a partial first row of video data, the at least a partial first row of video data comprising a first plurality of macroblocks; and a second processor that performs the decoding function on at least a partial second row of video data, the at least a partial second row of video data comprising a second plurality of macroblocks, wherein the second processor begins performing the decoding function on each one of the second plurality of macroblocks only after the first processor completes performance of at least one of (i) the decoding function and (ii) at least one further decoding function on which the decoding function in the second processor depends, on select ones of at least the first plurality of macroblocks upon which the each one of the second plurality of macroblocks depends.
 2. The system of claim 1 wherein the decoding function comprises one of a complete decoding of a macroblock or a performance of one or more stages of a decoding pipeline.
 3. The system of claim 1 wherein the decoding function comprises entropy decoding.
 4. The system of claim 1 wherein the select ones of at least the first plurality of macroblocks comprise a plurality of predecessor macroblocks.
 5. The system of claim 1 wherein the select ones of at least the first plurality of macroblocks comprises the first plurality of macroblocks and at least one of the second plurality of macroblocks.
 6. The system of claim 1 wherein the at least a partial second row of video data comprises a complete second row, and the second plurality of macroblocks comprises every macroblock in the complete second row.
 7. A system for decoding, in parallel, more than one row of video data comprising: a first processor that performs a decoding function on at least one predecessor macroblock in a first row of video data; and a second processor that begins performing the decoding function on a macroblock in a second row of video data only after the first processor completes performance of at least one of (i) the decoding function and (ii) at least one further decoding function on which the decoding function in the second processor depends, on the at least one predecessor macroblock in the first row of video data.
 8. The system of claim 7 wherein the decoding function comprises one of a complete decoding of a macroblock or a performance of one or more stages of a decoding pipeline.
 9. The system of claim 7 wherein the decoding function comprises entropy decoding.
 10. The system of claim 7 wherein the second processor begins performing the decoding function on a macroblock in the second row of video data only after the second processor completes performance of at least one of the decoding function and at least one further decoding function on which the decoding function depends, on at least one predecessor macroblock in the second row.
 11. A system for decoding, in parallel, more than one row of video data comprising a plurality of processors, each of the plurality of processors performing an identical decoding function on macroblocks of a different respective row of input video data, each of the plurality of processors waiting to perform the decoding function on a given macroblock in its respective row until at least one other of the plurality of processors has completed performance of at least one of (i) the decoding function and (ii) at least one further decoding function on which the decoding function in the each of the plurality of processors depends, on all predecessor macroblocks for the given macroblock.
 12. The system of claim 11 wherein the decoding function comprises one of a complete decoding of a macroblock or a performance of one or more stages of a decoding pipeline.
 13. The system of claim 11 wherein the decoding function comprises entropy decoding.
 14. The system of claim 11 wherein each of the plurality of processors begins performing the decoding function on a second macroblock in its respective row only after it has completed performance of at least one of the decoding function and at least one further decoding function on which the decoding function in the each processor depends, on at least one predecessor macroblock, for the given macroblock, in its respective row.
 15. A method of decoding, in parallel, more than one row of video data comprising: receiving a macroblock; determining that a decoding function has been completed with respect to each of a plurality of predecessor macroblocks for the received macroblock; and performing at least one of (i) the decoding function and (ii) at least one further decoding function that depends on the completion of the decoding function, on the received macroblock in response to the determining.
 16. The method of claim 15 further comprising identifying the plurality of predecessor macroblocks.
 17. The method of claim 15 wherein the decoding function comprises one of a complete decoding of a macroblock or a performance of one or more stages of a decoding pipeline.
 18. The method of claim 15 wherein the decoding function comprises entropy decoding.
 19. A method of decoding, in parallel, more than one row of video data comprising: receiving a macroblock; completing the performance of a decoding function on each of a plurality of predecessor macroblocks for the received macroblock; and performing at least one of (i) the decoding function and (ii) at least one further decoding function that depends on the completion of the decoding function, on the received macroblock only after the completing.
 20. The method of claim 19 further comprising identifying the plurality of predecessor macroblocks.
 21. The method of claim 19 wherein the decoding function comprises one of a complete decoding of a macroblock or a performance of one or more stages of a decoding pipeline.
 22. The method of claim 19 wherein the decoding function comprises entropy decoding.
 23. A method of decoding, in parallel, more than one row of video data comprising: receiving, by a first processor, a macroblock; performing, by a second processor, a decoding function on each of a plurality of predecessor macroblocks for the received macroblock; indicating, by the second processor to the first processor, that the performing has been completed; and performing, by the first processor, at least one of (i) the decoding function and (ii) at least one further decoding function that depends on completion of performance of the decoding function, on the received macroblock after the indicating.
 24. The method of claim 23 further comprising identifying the plurality of predecessor macroblocks.
 25. The method of claim 23 wherein the decoding function comprises one of a complete decoding of a macroblock or a performance of one or more stages of a decoding pipeline.
 26. The method of claim 23 wherein the decoding function comprises entropy decoding.
 27. The method of claim 23 wherein the indicating comprises reading, by the first processor, a register of the second processor.
 28. The method of claim 23 wherein the indicating comprises writing, by the second processor, information regarding completion to a memory, and wherein a third processor accesses the memory and communicates completion information to the first processor. 